feat: implement grading strategies for peer evaluations #2196
Conversation
Thanks for the pull request, @mariajgrimaldi! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green.
Force-pushed from df6bf6a to d2bf397.
Force-pushed from 94a83e3 to dd73e24.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##           master    #2196     +/-  ##
========================================
  Coverage   95.05%   95.06%
========================================
  Files         193      193
  Lines       21149    21285    +136
  Branches     1906     1918     +12
========================================
+ Hits        20104    20235    +131
- Misses        780      785      +5
  Partials      265      265

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed from 49a679f to 78549ab.
I still need to generate all the translations so tests don't fail. Meanwhile, I'll open the PR for review. Thank you!
@BryanttV: I think I addressed all your comments. Could you review again?
I still need to fix the JS tests, so I'll work on that. Thanks!
Hi @mariajgrimaldi, I have tested again and everything seems to work correctly. Only a minor change would be needed. Thank you!
Force-pushed from 0c1362b to a6bae46.
Force-pushed from aa96648 to 436ffce.
Hello, @pomegranited! I hope you're doing well. Could you give us a hand reviewing this PR? We'd appreciate it. Regarding the test failures, I'll be working on increasing the PR coverage this week. I hope to update the PR soon. Thanks!
Force-pushed from 436ffce to ced373a.
Hi @mariajgrimaldi, thank you for this feature! I think it'll be useful for content authors.
I noted a few minor nits, and had a question about rounding, but nothing that's blocking merge. If you agree with my suggested changes, could you apply them and do a version bump? I can merge tomorrow.
👍
- I tested this on my tutor dev stack with:
  - the flag disabled (the default) -- no option provided, defaulted to median
  - the flag enabled -- options provided, changed to "mean", was not able to change after the block was published
- I read through the code
- I checked for accessibility issues by using my keyboard to navigate the ORA editor.
- Includes documentation
- User-facing strings are extracted for translation
total_criterion_scores = len(scores)
if total_criterion_scores == 0:
    return 0
# Round the mean of the peer scores up to the nearest whole number.
return math.ceil(sum(scores) / float(total_criterion_scores))
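Regarding the rounding question raised in the review above: the snippet rounds the mean up to the next whole number. A quick standalone check with the same arithmetic (plain Python, not the PR's test suite):

import math

def ceil_mean(scores):
    # Same arithmetic as the snippet above: ceiling of the mean, 0 if empty.
    if not scores:
        return 0
    return math.ceil(sum(scores) / float(len(scores)))

print(ceil_mean([0, 1, 3, 1, 5]))  # 10 / 5 = 2.0  -> 2
print(ceil_mean([0, 3, 5]))        #  8 / 3 = 2.67 -> 3 (rounded up)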
This threw me for a loop, so I tried replicating the same scenario. Here are the criteria I used:
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5
Graded-by-5 configuration
And got:
While researching, I found that the assessments are retrieved, ordered, and then truncated to the number of peer grades needed. Could you confirm that the number of peer grades configured while testing was 5?
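To illustrate why that retrieve-order-truncate step could matter (a hypothetical sketch, not the actual query code): if the peer scores are cut down to the first N received before averaging, two orderings of the same scores can produce different grades.

import math

def ceil_mean_of_first_n(scores, n):
    # Hypothetical: keep only the first n peer scores, then take the
    # ceiling of their mean, mirroring the mean calculation above.
    selected = scores[:n]
    return math.ceil(sum(selected) / float(len(selected)))

print(ceil_mean_of_first_n([1, 0, 3, 1, 5], 3))  # (1 + 0 + 3) / 3 = 1.33 -> 2
print(ceil_mean_of_first_n([5, 3, 1, 1, 0], 3))  # (5 + 3 + 1) / 3 = 3.0  -> 3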
I also added an extra test here with the exact values: https://github.com/openedx/edx-ora2/pull/2196/files#diff-7fe9b2681acaf897f2dd62a13553badf288fd6144832ce0562c4a58bd2640e7fR2381
I also reproduced the same scenario with the graded-by-3 configuration and got 2, since the first three peers graded 1, 0, and 3 in that order. Please let me know if the submission order also affected your tests.
Oh wow, submission order does affect the final score! That's annoying.
I also added an extra test here with the exact values: https://github.com/openedx/edx-ora2/pull/2196/files#diff-7fe9b2681acaf897f2dd62a13553badf288fd6144832ce0562c4a58bd2640e7fR2381
Thanks for that -- I also ran this test with the numbers shuffled in a different order, to confirm that it's not the "mean score" logic that's causing this issue. It must be something done after that.
--- a/openassessment/assessment/test/test_peer.py
+++ b/openassessment/assessment/test/test_peer.py
@@ -2375,6 +2375,7 @@ class TestPeerApi(CacheResetTest):
self.assertEqual(31, Assessment.get_mean_score([5, 6, 12, 16, 22, 53, 102]))
self.assertEqual(31, Assessment.get_mean_score([16, 6, 12, 102, 22, 53, 5]))
self.assertEqual(2, Assessment.get_mean_score([0, 1, 3, 1, 5]))
+ self.assertEqual(2, Assessment.get_mean_score([5, 3, 1, 1, 0]))
Could you confirm that the number of peer grades configured while testing was 5?
Yes, 5 peer grades were required, and only 5 were received.
I used the default graded-by-5 configuration, with
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5
Audit user received the same "Content" peer assessments as Honor, but in a different order, and so they got different scores. (Apologies about the "Ideas" assessments, I thought I did them all the same too, but apparently not.)
@mariajgrimaldi I've re-tested with your changes and, with the exception of the ordering issue discussed above, I found no other issues. Redwood gets cut soon, so I'm happy to merge this now, and then you can do a follow-up PR to fix the ordering bug? We'll also need a version bump to get this into edx-platform:
@pomegranited: thank you again for such a detailed review! We appreciate it. I just bumped the version number in openassessment/__init__.py and package.json, and I'll be researching the grading-ordering issue further over the next few days. I'll let you know what I find!
@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
Awesome, thank you for your rapid reply @mariajgrimaldi! ora2==6.10.0 is released and ready for you to do a version bump PR in edx-platform if you want.
TL;DR - This PR adds a new grading calculation strategy that can be configured in the peer step's assessment configuration and used instead of the median calculation.
Currently, steps that calculate a final score based on multiple scores (e.g. peer steps) do so using the median by default; the calculation is done here. Using the median for this use case has its advantages. However, using it by default with no way to reconfigure it rules out other use cases that could be served by calculating the final score with the mean instead. Making the strategy configurable also opens the door to the other grading strategies mentioned in this thread.
That's why this product proposal was created and later approved.
What changed?
Currently:
- The get_score function gets the median score of all the scores the student has at the moment of grade generation by using get_assessments_median_scores.
- get_assessments_median_scores calls get_median_scores_dict, which calculates the final score for the student using the median.
To make the grading strategy configurable, we:
- Changed each reference to get_assessments_median_scores to the more generic get_assessments_peer_scores_with_grading_strategy, which determines which calculation to use based on the workflow configuration (see the sketch below). If the feature flag ENABLE_ORA_PEER_CONFIGURABLE_GRADING is off, the default get_assessments_median_scores is used.
By default, the median calculation is used.
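As a rough sketch of that dispatch (illustrative only: get_peer_score_sketch and its arguments are invented here, while the strategy names and the feature flag come from the description above):

import math
import statistics

def get_peer_score_sketch(scores, grading_strategy, flag_enabled):
    # Illustrative dispatch between the median and mean calculations,
    # following the description above; not the actual ORA implementation.
    if not scores:
        return 0
    if flag_enabled and grading_strategy == "mean":
        return math.ceil(sum(scores) / float(len(scores)))
    # Flag off or strategy left at "median": keep the existing behavior.
    return math.ceil(statistics.median(scores))

print(get_peer_score_sketch([0, 1, 5], "median", False))  # 1
print(get_peer_score_sketch([0, 1, 5], "mean", True))     # 2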
Developer Checklist
Testing Instructions
FEATURES["ENABLE_ORA_PEER_CONFIGURABLE_GRADING"] = True
Student 1 submits the ORA activity with a distinctive answer so we can distinguish it later
Student 2 scores Student 1 with 0
Student 3 scores Student 1 with 1
Student 4 scores Student 1 with 5
Now, the median is: 1 but the mean is: 2. So check that grading matches the mean value and the explanation says mean instead of median.
If you don't configure the mean grading then the default behavior is kept:
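A quick sanity check of those numbers in plain Python (not the ORA code; just the raw median vs. the rounded-up mean of the scores 0, 1, and 5 given above):

import math
import statistics

scores = [0, 1, 5]  # peer scores given to Student 1 in the steps above
print(statistics.median(scores))                    # 1 -> grade under the default median strategy
print(math.ceil(sum(scores) / float(len(scores))))  # 2 -> grade under the mean strategy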
Reviewer Checklist
Collectively, these should be completed by reviewers of this PR: